WEAK CONVERGENCE OF THE EMPIRICAL SPECTRAL
DISTRIBUTION OF ULTRA-HIGH-DIMENSIONAL 
BANDED SAMPLE COVARIANCE MATRICES 


By Kamil Jurczak 
Ruhr-Universität Bochum

In this article we investigate high-dimensional banded sample 
covariance matrices under the regime that the sample size n, the 
dimension p and the bandwidth d tend simultaneously to infinity 
such that 

$$n/p \to 0 \quad \text{and} \quad 2d/n \to y > 0.$$

It is shown that the empirical spectral distribution of those matrices
almost surely converges weakly to some deterministic probability
measure which is characterized by its moments. Certain restricted 
compositions of natural numbers play a crucial role in the evaluation 
of the expected moments of the empirical spectral distribution. 


1. Introduction. In statistics, high-dimensional sparse sample covariance 
matrices naturally occur as regularized estimators of population covariance matrices 
in high dimensions provided most entries are known to be zero or close to zero, cf. 
Bickel and Levina (2008a), Levina and Vershynin (2012). Statistical properties of 
this type of estimator have been intensively studied in recent years. Let us just
mention a few crucial contributions. El Karoui (2008) provided a consistent
estimate under the spectral norm for certain sparse sample covariance matrices 
based on thresholding. Lam and Fan (2009) studied the rate of convergence of 
estimators for sparse covariance matrices and precision matrices based on penalized 
likelihood. Cai and Zhou (2012) determined the minimax rates for sparse covariance 
matrix estimation under various matrix norm losses over appropriate classes of 
covariance matrices. Banded covariance matrices arise as a special case of sparse
covariance matrices. For the former, it is a priori known that the non-zero entries
do not lie too far from the diagonal. Bickel and Levina (2008b) investigated a 
regularized estimator for banded covariance matrices and its rate of convergence. 
Qiu and Chen (2012) proposed a test for bandedness. 

Apart from statistics, sparse sample covariance matrices are applicable in models 
of physical systems, where most particles do not interact with each other, see Bai 
and Zhang (2007). Despite this rich occurrence, there is not much known about 
the spectral properties of high-dimensional sparse sample covariance matrices as 
compared to the classical high-dimensional sample covariance matrices. Under some 
slight regularity assumptions Bai and Zhang (2007) have proved that the empirical 
spectral distribution of



converges to the semicircular law as $d/n \to 0$ and $p, d, n \to \infty$, where the entries of
$X_p = (X_{ik,p}) \in \mathbb{R}^{p \times n}$ are independent, centered random variables with variance

$\sigma^2 > 0$, the symmetric matrix $D_p = (D_{ij,p}) \in \mathbb{R}^{p \times p}$ is independent of $X_p$ with

$= d + o(d)$, and $\circ$ denotes the Hadamard product. In particular, the
case that $D_p$ is a deterministic 0-1 sparsity mask with $d$ non-zero entries per column
is covered by this model. The assumption $d/n \to 0$ is crucial for their result.
On the contrary, an intrinsic consequence of the investigation in this article is that 
for $d/n \to y > 0$ the limiting spectral distribution of a sparse sample covariance matrix, if it
exists, depends essentially on the structure of the sparsity mask through the
number of certain restricted compositions of natural numbers. However, the focus 
in this article lies on the special case of banded sample covariance matrices. For 
those we prove that their sequence of empirical spectral distributions almost surely 
converges weakly, where the limiting distribution is described by its moments. 

In contrast to banded or sparse sample covariance matrices, Wigner matrices with 
an additional sparsity structure have been extensively studied. Let us just mention a
few contributions. Bogachev, Molchanov and Pastur (1991) proved under slight regularity
conditions that the empirical spectral distribution of sparse Wigner matrices
converges weakly to the semicircular law. Benaych-Georges and Péché (2014a)
showed that its largest eigenvalue converges to 2 in probability and that eigenvectors
corresponding to eigenvalues far enough from zero are delocalized if the number
of non-zero entries per row is of larger order than $(\log N)^{6(1+\alpha)}$, where $N$ is the
number of rows of the random matrix and the parameter $\alpha$ depends on the tails of
the underlying distribution. Further, Benaych-Georges and Péché (2014b) studied
localization and delocalization of eigenvectors for heavy-tailed band Wigner matrices.
In an extraordinary article Sodin (2010) investigated the limiting distribution
of the smallest and largest eigenvalues of band Wigner matrices.

The article is structured as follows. In the rest of this section we introduce the basic
notation, recall some useful results, and summarize the method of moments. In
Section 2 we compile some combinatorial tools to evaluate the expected moments
of the spectral distribution of banded sample covariance matrices. The concept of
ordered trees with a d-band structure on the I-line is introduced, and an expansion
for the number of these so-called d-banded ordered trees with a fixed number of 
vertices is given by means of restricted compositions of natural numbers. Finally, 
Section 3 is devoted to the main result concerning the almost sure weak convergence 
of the spectral distribution of banded sample covariance matrices and its proof. 

1.1. Preliminaries. We denote the ordered eigenvalues of a symmetric matrix
$A \in \mathbb{R}^{p \times p}$ by $\lambda_1(A) \ge \dots \ge \lambda_p(A)$. Then, the spectral distribution of $A$ is the
normalized counting measure on the eigenvalues of $A$,

$$\mu^A = \frac{1}{p} \sum_{i=1}^{p} \delta_{\lambda_i(A)},$$

where $\delta_x$ is the Dirac measure on $x$. We write

$$d_L(\mu, \nu) = \inf\{\varepsilon > 0 : \mu((-\infty, x - \varepsilon]) - \varepsilon \le \nu((-\infty, x]) \le \mu((-\infty, x + \varepsilon]) + \varepsilon \text{ for all } x \in \mathbb{R}\}$$

for the Lévy distance between two probability measures $\mu$ and $\nu$. Moreover, we will
also frequently use the Kolmogorov distance

$$d_K(\mu, \nu) = \sup_{x \in \mathbb{R}} |\mu((-\infty, x]) - \nu((-\infty, x])|.$$

Recall the basic relation $d_L(\mu, \nu) \le d_K(\mu, \nu)$. We abbreviate the set $\{1, \dots, p\}$, $p \in \mathbb{N}$,
by $[p]$. For any subset $N \subset [p]^2$ the matrix $A = \mathbb{1}_N \in \mathbb{R}^{p \times p}$ has entries
$A_{ij} = \mathbb{1}_N(i, j)$, where $\mathbb{1}_N : [p]^2 \to \{0, 1\}$ is the indicator function on $N$. For
$N := \{(i, j) \in [p]^2 : |i - j| \le d\}$ we define $\mathbb{1}_d := \mathbb{1}_N$. For an expression $f(p, d, n, l)$
we write $O_l(g(p, d, n, l))$ if there exists a positive function $h$ such that $f(p, d, n, l) \le
h(l) g(p, d, n, l)$ for all $p, d, n, l$.

Let us recall some useful results to bound the Lévy distance between the spectral
distributions of two symmetric matrices $A, B \in \mathbb{R}^{p \times p}$.


Theorem 1.1 (Theorem A.43 of Bai and Silverstein (2010)). Let $A$ and $B$ be
two $p \times p$ symmetric matrices. Then,

$$d_K(\mu^A, \mu^B) \le \frac{1}{p} \operatorname{rank}(A - B), \tag{1.1}$$

where $\mu^A$ and $\mu^B$ denote the spectral distributions of $A$ and $B$, respectively.
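
The rank inequality can be checked by hand on a toy case. The following is a minimal sketch, assuming two hypothetical diagonal matrices (so that the eigenvalues are simply the diagonal entries); the helper names `ecdf` and `kolmogorov_distance` are illustrative and not taken from the article.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Value at x of the empirical distribution function of sorted_values."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def kolmogorov_distance(eigs_a, eigs_b):
    """Kolmogorov distance between the empirical measures of two finite lists."""
    a, b = sorted(eigs_a), sorted(eigs_b)
    # Both distribution functions are right-continuous step functions, so the
    # supremum over x is attained at one of the atoms.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


# Hypothetical example: A = diag(1, 2, 3, 4) and B = diag(1, 2, 3, 10).
eigs_A = [1.0, 2.0, 3.0, 4.0]
eigs_B = [1.0, 2.0, 3.0, 10.0]
p = len(eigs_A)
# A - B is diagonal, so its rank is the number of differing diagonal entries.
rank_of_difference = sum(1 for x, y in zip(eigs_A, eigs_B) if x != y)

distance = kolmogorov_distance(eigs_A, eigs_B)
bound = rank_of_difference / p
print(distance, bound)
```

In this toy case the bound is attained: a rank-one perturbation moves one of four eigenvalues, so the two distribution functions differ by exactly 1/4 on an interval.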


Theorem 1.2 (Theorem A.38 of Bai and Silverstein (2010)). Let $\lambda_1, \dots, \lambda_p$
and $\delta_1, \dots, \delta_p$ be two families of real numbers and their empirical distributions be
denoted by $\mu$ and $\bar\mu$. Then, for any $\alpha > 0$, we have

$$d_L^{\alpha+1}(\mu, \bar\mu) \le \min_{\pi} \frac{1}{p} \sum_{k=1}^{p} |\lambda_k - \delta_{\pi(k)}|^{\alpha}, \tag{1.2}$$

where the minimum runs over all permutations $\pi$ on $\{1, \dots, p\}$.

Corollary 1.3 (Corollary A.41 of Bai and Silverstein (2010)). Let $A$ and $B$
be two $p \times p$ Hermitian matrices with spectral distributions $\mu^A$ and $\mu^B$. Then,

$$d_L^3(\mu^A, \mu^B) \le \frac{1}{p} \operatorname{tr}\left((A - B)(A - B)^*\right). \tag{1.3}$$

1.2. Method of moments. The method of moments is a tool to deduce weak 
convergence of a sequence of measures and goes back to Tchebycheff (1890). Wigner 
(1958) was the first to apply this technique to random matrices for the purpose of
establishing the weak convergence of a sequence of expected empirical spectral 
distributions of Wigner matrices to the semi-circular law. The foundation of the 
method of moments is the following statement. 
Theorem 1.4 (Moment convergence theorem). Let $\mu_n$, $n \in \mathbb{N}$, be probability
measures on the real line with finite moments $m_{n,r} := \int x^r \, d\mu_n(x)$, $r \in \mathbb{N}$. Suppose
that $m_r = \lim_{n \to \infty} m_{n,r}$ exists for every $r \in \mathbb{N}$. Then, there exists a probability measure
$\mu$ with moments $m_r$, $r \in \mathbb{N}$. Moreover, if $\mu$ is the unique probability measure
with moments $m_r$, $r \in \mathbb{N}$, then the sequence $(\mu_n)$ converges weakly to $\mu$.


Proof. For arbitrary $\varepsilon > 0$ let $N \in \mathbb{N}$ be sufficiently large such that
$|m_{n,2} - m_2| < \varepsilon$ for all $n \ge N$.


Then by Markov's inequality,

$$\mu_n([-x, x]^c) \le \frac{m_{n,2}}{x^2} \le \frac{m_2 + \varepsilon}{x^2}$$

for any $x > 0$, which implies that the sequence $(\mu_n)$ is tight. Hence, by Prokhorov's
theorem for any subsequence $(\mu_{n_k})$ there exists a subsubsequence $(\mu_{n_{k_l}})$ converging
weakly to a probability measure $\mu$. We show that the moments of $\mu$ are given by the
sequence $(m_r)$, $r \in \mathbb{N}$. Thereto, let $X_{n_{k_l}} \sim \mu_{n_{k_l}}$ and $X \sim \mu$. By the convergence
of all moments, the sequences $(X^r_{n_{k_l}})_l$ are uniformly integrable and therefore $X^r$
is integrable, and $E X^r_{n_{k_l}} \to E X^r = m_r$ for all $r \in \mathbb{N}$. Now suppose that $\mu$ is
the unique measure with moments $m_r$, $r \in \mathbb{N}$. Then each subsequence $(\mu_{n_k})$ has
a weakly convergent subsubsequence $(\mu_{n_{k_l}})$ with limit $\mu$. This implies the weak
convergence of $(\mu_n)$ to $\mu$. □


The question whether a sequence of moments $m_r$, $r \in \mathbb{N}$, uniquely determines
a measure $\mu$ on the real line is partially answered by Carleman's condition, which
says that $\mu$ is the only measure with moments $m_r$, $r \in \mathbb{N}$, if

$$\sum_{r=1}^{\infty} m_{2r}^{-1/(2r)} = \infty.$$

This condition is satisfied if the moments do not grow too fast. In particular,
all probability measures with sub-exponential tails are thereby determined by their moments.
In the main theorem of this article the limiting spectral measure even has bounded
support.
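
As a small numerical illustration of Carleman's condition, the sketch below evaluates the terms $m_{2r}^{-1/(2r)}$ for the semicircular law, whose even moments are the Catalan numbers. This example is an assumption chosen for illustration (the semicircular law is not the limiting measure of this article), and the function names are hypothetical.

```python
from math import comb


def catalan(r):
    """r-th Catalan number, the 2r-th moment of the standard semicircular law."""
    return comb(2 * r, r) // (r + 1)


def carleman_terms(even_moments):
    """Terms m_{2r}^{-1/(2r)}, r = 1, 2, ..., for a list [m_2, m_4, ...]."""
    return [m ** (-1.0 / (2 * r)) for r, m in enumerate(even_moments, start=1)]


R = 100
semicircle_even_moments = [catalan(r) for r in range(1, R + 1)]
terms = carleman_terms(semicircle_even_moments)
partial_sum = sum(terms)
print(partial_sum)
```

Since the r-th Catalan number is at most $4^r$, every term is at least 1/2, so the partial sums grow at least linearly and the series diverges. The same argument applies to any moment sequence with $m_{2r} \le C^{2r}$, that is, to any measure with bounded support.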

2. Combinatorial tools. In this section we introduce some basic combinatorial
objects which will be useful to prove the convergence of the expected moments
of the empirical spectral distribution of a banded sample covariance matrix.

2.1. Walks on ordered trees. A (finite simple) graph $G = (V, E)$ is a pair of a
finite vertex set $V \neq \emptyset$ and an edge set $E \subset \{e \subset V : |e| = 2\}$ such that $V \cap E = \emptyset$.
$G' = (V', E')$ is a subgraph of $G$ if $V' \subset V$ and $E' \subset E$.

Let $v \in V$ be a vertex of a graph $G = (V, E)$. The vertex $v$ is called incident with
an edge $e \in E$ if $v \in e$. The number $\deg(v)$ of edges incident with $v$ is the degree
of $v$. A vertex $v'$ is said to be a neighbor of $v$ if $\{v, v'\} \in E$. If $V = V_1 + V_2$ such
that $|e \cap V_1| = |e \cap V_2| = 1$ holds for any edge $e \in E$, then $G$ is a bipartite graph with
parts $V_1$ and $V_2$.

A walk of length $n - 1$ on a graph $G = (V, E)$ is a sequence of vertices $v_1, \dots, v_n$,
$n \in \mathbb{N}$, such that $\{v_i, v_{i+1}\} \in E$ for all $i \in [n - 1]$. The vertex $v_1$ is the start vertex,
$v_n$ the end vertex and $v_2, \dots, v_{n-1}$ the inner vertices. We say a vertex $v$ is visited by
a walk $v_1, \dots, v_n$ if $v \in \{v_1, \dots, v_n\}$. An edge $e \in E$ is crossed by a walk $v_1, \dots, v_n$
if $e = \{v_i, v_{i+1}\}$ for some $i \in [n - 1]$. We say the walk $v_1, \dots, v_n$ crosses (resp. visits)
an edge $e \in E$ (resp. a vertex $v \in V$) at step $k$ if $e = \{v_k, v_{k+1}\}$ (resp. $v = v_{k+1}$).
A walk is closed if the start vertex and the end vertex coincide. Further, a walk
$v_1, \dots, v_n$ on a graph $G$ crossing each edge at most once is a path. A circle is a closed
walk $v_1, \dots, v_n, v_1$, where $v_1, \dots, v_n$ is a path.

A graph $G = (V, E)$ is said to be connected if for any pair of vertices $v, v' \in V$
there exists a path on $G$ from the start vertex $v$ to the end vertex $v'$. A (connected)
component of $G$ is a graph $G' = (V', E')$ with $V' \subset V$ and $E' = E \cap \{e \subset V' : |e| = 2\}$ such
that $G'$ is connected and for any two vertices $v \in V'$ and $v' \in V \setminus V'$ there does not
exist a path from $v$ to $v'$ in $G$. In other words, the components of a graph $G$ are the
maximal connected subgraphs of $G$.

A connected graph $G = (V, E)$ is called a tree if for any edge $e \in E$ the graph
$(V, E \setminus \{e\})$ is not connected. That is, there is exactly one path from $v$ to
$v'$ for any two vertices $v, v' \in V$, and trees are free of circles. Moreover, it
is well-known that a connected graph on $n \in \mathbb{N}$ vertices is a tree if and only if it
contains exactly $n - 1$ edges.

A rooted tree is a pair $(G, v_{root})$, where $G = (V, E)$ is a tree and $v_{root} \in V$ is a
designated vertex of $G$ called the root of $G$. There is a natural partial ordering on
a rooted tree. We write $v \le_G v'$ if $v$ is visited by the path with start vertex $v_{root}$
and end vertex $v'$. Clearly, the root satisfies $v_{root} \le_G v$ for all $v \in V$. A neighbor
$v' \in V$ of a vertex $v \in V$ with $v \le_G v'$ is a child of $v$ and $v$ is the parent of $v'$.



Fig 1. Example of a rooted tree G with <a V 2 <g v^ <a z>5 and zii <a V 2 <a z'r- 


A vertex $v \neq v_{root}$ has $\deg(v) - 1$ children and one parent. If the children of each
vertex $v \in V$ are equipped with a total order $\le_v$, then $G$ is called an ordered
tree or plane tree. The latter name is justified because there is a natural embedding
of the graph into the plane by drawing the children of a vertex increasing from
left to right. An ordered tree may be associated with a closed walk $v_1, \dots, v_{2|V|-1}$



Fig 2. Example of two different ordered trees: The order on the children of Vi in the left 
graph is < V 2 '' < whereas the depiction of the right graph implies V 2 < < 113 ^^ 

“around the tree” defined by the following inductive procedure. The root $v_{root}$ is
the starting vertex $v_1$ of the walk. Let $v_k$ be the vertex visited at the $(k - 1)$-th
step and $v^{(k)}_1 \le_{v_k} \dots \le_{v_k} v^{(k)}_l \in V$ its children. If the walk has already visited the vertex
$v^{(k)}_i$ for some $0 \le i < l$ (none of the children if $i = 0$) but not the vertex $v^{(k)}_{i+1}$, then
$v_{k+1} = v^{(k)}_{i+1}$; otherwise $v_{k+1}$ is the
parent of $v_k$, respectively $v_k$ is the end vertex of the walk if $v_k = v_{root}$. It is easy
to see that the walk crosses all edges of the tree once in each direction and hence
the procedure stops right after $2|V| - 2$ steps at the root. On the other hand, let
$v_1, \dots, v_{2n-1}$ be a closed walk on $G = (V, E)$ with $V = \{v_1, \dots, v_{2n-1}\}$, $|V| = n$,
and $E = \{\{v_i, v_{i+1}\} : i \in [2n - 2]\}$ such that each edge $e \in E$ is crossed at least
twice by the walk $v_1, \dots, v_{2n-1}$. Since $|E| \le |V| - 1$ and $G$ is connected, it holds that
$|E| = |V| - 1$. Hence, $G$ is a tree. Let $v_{root} := v_1$ be the root of $G$. Then, for each
vertex $v \in V$ of the tree there is a natural order on its children induced by the
increasing sequence in which they have been visited by the walk for the first time.
On a fixed vertex set $V = \{u_1, \dots, u_n\}$ this defines a bijection between ordered trees
on $V$ and closed walks in $V$ crossing each edge at least twice. Subsequently, for
an ordered tree $G$ this walk is called the canonical walk on $G$.
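
The correspondence between ordered trees and canonical walks can be sketched in a few lines. This is a minimal illustration, assuming an ordered tree is stored as a dictionary mapping each vertex to the ordered list of its children; the representation and the function names are hypothetical choices, not part of the article.

```python
def canonical_walk(children, root):
    """Closed walk around an ordered tree: visit children in order, then return."""
    walk = [root]

    def visit(v):
        for child in children.get(v, []):
            walk.append(child)   # step down to the next unvisited child
            visit(child)
            walk.append(v)       # step back up to the parent

    visit(root)
    return walk


def tree_from_walk(walk):
    """Recover root and ordered children from a closed walk around a tree."""
    root = walk[0]
    parent = {root: None}
    children = {root: []}
    for prev, cur in zip(walk, walk[1:]):
        if cur not in parent:
            # First visit: cur becomes the next child of prev in the order.
            parent[cur] = prev
            children[prev].append(cur)
            children[cur] = []
        elif parent[prev] != cur:
            raise ValueError("not a walk around a tree")
    return root, children


# Hypothetical tree with vertex names i1..i4 and k1..k3.
children = {"i1": ["k1"], "k1": ["i2"], "i2": ["k2"], "k2": ["i3", "i4"], "i4": ["k3"]}
walk = canonical_walk(children, "i1")
print(walk)
```

The walk has $2|V| - 1$ entries and crosses every edge once in each direction; `tree_from_walk` inverts the construction by making each newly visited vertex the next child of the vertex it was reached from.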

Two ordered trees $G$ and $G'$ with vertex sets $V$ and $V'$ are isomorphic if there
exists a bijection $\pi : V \to V'$ such that $v_1, \dots, v_{2|V|-1}$ is the canonical walk on
$G$ and $\pi(v_1), \dots, \pi(v_{2|V|-1})$ the canonical walk on $G'$. The mapping $\pi$ is called an
isomorphism. Let $\pi$ be an isomorphism from $G$ to $G'$, then the following properties
are satisfied:

1. $G$ and $G'$ have the same number of vertices and edges.

2. Let $v, w \in V$. Then, $\{v, w\}$ is an edge of $G$ if and only if $\{\pi(v), \pi(w)\}$ is an
edge of $G'$.

3. $v_1$ is the root of $G$ and $\pi(v_1)$ the root of $G'$.

4. Let $v, w$ be two vertices of $G$. Then, $v \le_G w$ on $G$ if and only if $\pi(v) \le_{G'} \pi(w)$
on $G'$.

5. Let $v$ be a vertex of $G$, and $w_1$ and $w_2$ two of its children. Then, $w_1 \le_v w_2$
if and only if $\pi(w_1) \le_{\pi(v)} \pi(w_2)$.

Fig 3. Example of a canonical walk on an ordered tree: The walk starts in the root and
runs clockwise.

Each tree $G = (V, E)$ is a bipartite graph. To see this, fix some vertex $v \in V$.
Then, define $V_1 := \{v' \in V : \text{the path } v, \dots, v' \text{ has even length}\}$ and $V_2 := \{v' \in
V : \text{the path } v, \dots, v' \text{ has odd length}\}$. The sets $V_1$ and $V_2$ are well-defined
since on a tree there is exactly one path with start vertex $v$ and end vertex $v'$.
If $v', v'' \in V_1$ or $v', v'' \in V_2$ then it holds $\{v', v''\} \notin E$, since for any two vertices
$v', v'' \in V$ with $\{v', v''\} \in E$ the lengths of the paths $v, \dots, v'$ and $v, \dots, v''$ differ
by 1. So, either $v, \dots, v'$ or $v, \dots, v''$ has even length, and therefore $V = V_1 + V_2$.



Fig 4. Depiction of a tree via its two parts.




Let $p, n \in \mathbb{N}$ and refer to the set $\{(i, 1) : i \in [p]\}$ as the I-line and $\{(k, 0) : k \in [n]\}$
as the K-line. Subsequently we will only consider ordered trees $G = (V_1 + V_2, E)$
such that the part $V_1$ containing the root of $G$ is a subset of the I-line, whereas $V_2$
is a subset of the K-line. We will usually identify the elements on the I-line and on
the K-line with their first component, where a label $i$ always refers to a vertex on
the I-line and $k$ to a vertex on the K-line. Moreover, we adopt the usual order
on the natural numbers to the I-line as well as to the K-line. Let $\mathcal{G}_{p,n,l+1}$ be the
set of ordered rooted trees on $l + 1$ vertices such that the part containing the root
lies on the I-line and the other part on the K-line. We denote an ordered tree in
$\mathcal{G}_{p,n,l+1}$ by $G(i, k)$, where $i = (i_1, \dots, i_l) \in [p]^l$ and $k = (k_1, \dots, k_l) \in [n]^l$ such that
$i_1, k_1, \dots, i_l, k_l, i_1$ is the canonical walk around the ordered tree $G(i, k)$ with root $i_1$.



Fig 5. Depiction of an ordered tree with canonical walk
$i_1, k_1, i_2, k_2, i_3, k_2, i_4, k_3, i_4, k_2, i_2, k_1, i_1$ via its two parts on the I- and K-line.

An essential quantity to evaluate the l-th expected moment of a (classical) high-dimensional
sample covariance matrix is the number of ordered trees in $\mathcal{G}_{p,n,l+1}$.
The usual approach to count the number of graphs in $\mathcal{G}_{p,n,l+1}$ is to subdivide
$\mathcal{G}_{p,n,l+1}$ into isomorphy classes and then to count the number of graphs in each
isomorphy class. Let $G(i, k) \in \mathcal{G}_{p,n,l+1}$ be an arbitrary ordered tree. Note that an
isomorphism $\pi$ from $G(i, k)$ to an isomorphic graph $G(i', k')$ preserves the parts.
Hence, we split $\pi$ into its restrictions to the vertices on the I-line and K-line denoted
by $\pi_I : \{i_1, \dots, i_l\} \to \{i'_1, \dots, i'_l\}$ and $\pi_K : \{k_1, \dots, k_l\} \to \{k'_1, \dots, k'_l\}$. Among
the graphs in the isomorphy class $[G(i, k)]$ there is one graph $G(i^c, k^c)$ which is
called the canonical representative of $[G(i, k)]$, defined as follows. The enumeration is
equivalent on both parts. Therefore, we restrict to the part on the I-line. Let $i^c_1 = 1$
and $1 < r \le l$. If $|\{i_1, \dots, i_r\}| = |\{i_1, \dots, i_{r-1}\}| + 1$, then $i^c_r = |\{i_1, \dots, i_{r-1}\}| + 1$,
otherwise there exists an index $s < r$ such that $i_r = i_s$ and we define $i^c_r = i^c_s$. Indeed,
the graphs $G(i, k)$ and $G(i^c, k^c)$ are isomorphic and the canonical representative
does not depend on the choice of the ordered tree $G(i, k) \in [G(i, k)]$. A canonical
representative of an equivalence class is also called a canonical ordered tree. For
$l + 1 < n \vee p$ the number of equivalence classes depends only on $l$ but not on $p$
and $n$. Now, let $G(i^c, k^c)$ be a canonical ordered tree. The number of ordered trees
in $[G(i^c, k^c)]$ is given by the product of the numbers of bijections from $\{i^c_1, \dots, i^c_l\}$ into
subsets of the I-line and bijections from $\{k^c_1, \dots, k^c_l\}$ into subsets of the K-line. Both
latter quantities depend only on the number $r + 1$ of vertices on the I-line and are
explicitly given by

$$\frac{p!}{(p - (r + 1))!} \quad \text{and} \quad \frac{n!}{(n - (l - r))!}.$$

Hence, two isomorphy classes have the same cardinality if their canonical representatives
have the same number of vertices on the I-line. For fixed $r < l$ this raises the
question how many canonical ordered trees have $r + 1$ vertices on the I-line. It is
well-known (see e.g. Lemma 3.4 in Bai and Silverstein (2010)) that the answer is

$$\frac{1}{r + 1} \binom{l}{r} \binom{l - 1}{r}.$$

Altogether, the number of ordered trees in $\mathcal{G}_{p,n,l+1}$ is given by

$$\sum_{r=0}^{l-1} \frac{1}{r + 1} \binom{l}{r} \binom{l - 1}{r} \frac{p!}{(p - (r + 1))!} \frac{n!}{(n - (l - r))!}.$$
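
The counting argument above can be checked by brute force for small parameters. The sketch below is an illustration under stated assumptions: a pair of multi-indices $(i, k)$ is accepted if the closed walk $i_1, k_1, \dots, i_l, k_l, i_1$ visits $l + 1$ distinct vertices and uses $l$ distinct edges, which forces the underlying graph to be a tree whose edges are each crossed exactly twice. All function names are hypothetical.

```python
from itertools import product
from math import comb, perm


def canonical_labels(seq):
    """Relabel entries by order of first appearance, e.g. (7, 3, 7, 5) -> (1, 2, 1, 3)."""
    first_seen = {}
    return tuple(first_seen.setdefault(x, len(first_seen) + 1) for x in seq)


def is_tree_walk(i, k):
    """True if i1,k1,...,il,kl,i1 is a closed walk around a tree on l+1 vertices."""
    l = len(i)
    walk = []
    for a, b in zip(i, k):
        walk += [("I", a), ("K", b)]
    walk.append(("I", i[0]))
    vertices = set(walk)
    edges = {frozenset(step) for step in zip(walk, walk[1:])}
    return len(vertices) == l + 1 and len(edges) == l


def count_brute_force(p, n, l):
    """Number of ordered trees obtained by enumerating all multi-indices (i, k)."""
    return sum(
        is_tree_walk(i, k)
        for i in product(range(1, p + 1), repeat=l)
        for k in product(range(1, n + 1), repeat=l)
    )


def count_formula(p, n, l):
    """Sum over r of (number of canonical trees) * (labelings on I-line) * (on K-line)."""
    return sum(
        comb(l, r) * comb(l - 1, r) // (r + 1) * perm(p, r + 1) * perm(n, l - r)
        for r in range(l)
    )


print(canonical_labels((7, 3, 7, 5)))
print(count_brute_force(3, 3, 2), count_formula(3, 3, 2))
```

For $p = n = 3$ and $l = 2$ both counts equal 36: 18 trees in which the root has two children on the K-line, and 18 paths $i_1, k_1, i_2$.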


Now, let us consider ordered trees with $l + 1$ vertices which have a band structure
on the I-line. This new concept will be helpful to evaluate the expected moments of
banded sample covariance matrices. We say an ordered tree $G(i, k) \in \mathcal{G}_{p,n,l+1}$ is d-banded
(on the I-line) if the multi-index $i = (i_1, \dots, i_l)$ satisfies $|i_s - i_{s+1}| \le d$ for any $s =
1, \dots, l$ with $i_{l+1} = i_1$. Denote the subset of all d-banded ordered trees in $\mathcal{G}_{p,n,l+1}$
by $\mathcal{B}_{p,n,d,l+1}$. Subsequently, we assume that $d > l$ and $p > 2ld$. The cardinality
of $\mathcal{B}_{p,n,d,l+1}$ is crucial to evaluate the expected moments of the spectral measure
of banded sample covariance matrices in high dimensions. $\mathcal{B}_{p,n,d,l+1}$ has the same
number of canonical ordered trees as $\mathcal{G}_{p,n,l+1}$; however, the (asymptotic) number
of ordered trees isomorphic to a canonical ordered tree depends not only on the
number of vertices on the I-line but also on the set of degrees of the vertices on the
K-line. The latter statement is investigated in the next subsection.


2.2. Restricted compositions and the number of d-banded ordered trees. A
basic tool to evaluate the number of graphs in $\mathcal{B}_{p,n,d,l+1}$ isomorphic to a canonical
graph $G(i^c, k^c)$ are compositions of natural numbers.

Definition 2.1. For any $n \in \mathbb{N}$, a tuple $(a_1, \dots, a_k) \in \mathbb{N}^k$, $k \in \mathbb{N}$, satisfying
$a_1 + \dots + a_k = n$ is called a k-composition of $n$. If a set $A \subset \mathbb{N}^k$ is designated
and $(a_1, \dots, a_k) \in A$, $a_1 + \dots + a_k = n$, then we name $(a_1, \dots, a_k)$ a restricted
k-composition. For the special case $A = \{1, \dots, m\}^k$, $m \in \mathbb{N}$, define $F(n, k, m)$ as
the number of the corresponding restricted k-compositions of $n$.


The values $F(n, k, m)$, $n, k, m \in \mathbb{N}$, may be determined by the method of generating
functions and are explicitly given by

$$F(n, k, m) = \sum_{j=0}^{\lfloor (n-k)/m \rfloor} (-1)^j \binom{k}{j} \binom{n - jm - 1}{k - 1},$$

see Abramson (1976). Now, let $G(i^c, k^c) \in \mathcal{B}_{p,n,d,l+1}$ be a canonical ordered tree.
The aim of this subsection is to express $|[G(i^c, k^c)]|$ in terms of the numbers

$$F(\deg(k^*)d, \deg(k^*), 2d), \quad k^* \in \{k^c_1, \dots, k^c_l\}.$$
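
The numbers $F(n, k, m)$ are easy to tabulate for small arguments. The following sketch compares a direct enumeration of restricted compositions with the standard inclusion-exclusion expression obtained from the generating function $(x + \dots + x^m)^k$; the exact form of the sum used in the code is an assumption about how the closed formula is written, and the function names are hypothetical.

```python
from itertools import product
from math import comb


def F_brute_force(n, k, m):
    """Number of tuples (a_1, ..., a_k) with entries in {1, ..., m} summing to n."""
    return sum(1 for a in product(range(1, m + 1), repeat=k) if sum(a) == n)


def F_formula(n, k, m):
    """Inclusion-exclusion count; the range is empty (value 0) whenever n < k."""
    return sum(
        (-1) ** j * comb(k, j) * comb(n - j * m - 1, k - 1)
        for j in range((n - k) // m + 1)
    )


d = 3
print(F_brute_force(2 * d, 2, 2 * d), F_formula(2 * d, 2, 2 * d))
```

For a vertex of degree 2 and bandwidth $d = 3$ this gives $F(2d, 2, 2d) = 2d - 1 = 5$, since the first part may take any value in $\{1, \dots, 2d - 1\}$ and the second part is then determined.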


Lemma 2.2. Let $G(i^c, k^c) \in \mathcal{B}_{p,n,d,l+1}$ with $r + 1$ vertices on the I-line be
a canonical ordered tree and $[G(i^c, k^c)]$ the class of isomorphic ordered trees in
$\mathcal{B}_{p,n,d,l+1}$. Then,

\[G{F,k‘=)]\=pn'‘~^ F(deg(fc*)d,deg(fc*),2d) 


Proof. Let $r + 1$ be the number of vertices on the I-line. Each pair

$$(\pi_I, \pi_K), \quad \pi_I : [r + 1] \to [p], \quad \pi_K : [l - r] \to [n]$$

of injective functions with $|\pi_I(i^c_s) - \pi_I(i^c_{s+1})| \le d$ corresponds to an ordered tree
$G(i, k) \in [G(i^c, k^c)]$ with $i_s = \pi_I(i^c_s)$ and $k_s = \pi_K(k^c_s)$, $s = 1, \dots, l$, and vice versa. As
in the classical case there are $n!/(n - (l - r))!$ possible choices for the mapping $\pi_K$. The
evaluation of the number of permissible mappings $\pi_I$ is more involved. First we
have $p$ possible labels for the root $i_1$ of the tree. For simplification we consider only
labelings with $ld < i_1 \le p - ld$. This reduces the number of permissible labelings by
$O_l(n^{l-r} d^{r+1})$ but ensures that we do not have to take into consideration labelings
close to the boundary of the I-line. Now, the remaining vertices on the I-line of
the ordered tree are labeled by induction on the vertices lying on the K-line. For
simplification of the presentation assume that $k = k^c$, and let $k^* = 1, \dots, l - r$
run over the vertices of the ordered tree on the K-line. Assume that the children
of the vertex $k^*$ on the K-line are already labeled for all $k^* < s$, $s \le l - r$. Then

for $k^* = s$ we label its children in the following way. Let $i^{(s)}_1 \le_s \dots \le_s i^{(s)}_{\deg(s)-1}$ be
the ordered children of $s$. The parent $i^{(s)}_{parent}$ of $s$ is already labeled by the inductive
procedure. By the definition of the canonical walk, a labeling of the children does
not violate the d-band structure on the I-line if and only if

$$|i^{(s)}_{parent} - i^{(s)}_1| \le d, \quad |i^{(s)}_1 - i^{(s)}_2| \le d, \quad \dots, \quad |i^{(s)}_{\deg(s)-2} - i^{(s)}_{\deg(s)-1}| \le d, \quad |i^{(s)}_{\deg(s)-1} - i^{(s)}_{parent}| \le d$$

and each label in $[p]$ is assigned to at most one already labeled vertex on the I-line.
Now we simplify this problem without essentially changing the number of
permissible labelings. First note that rejecting the second condition changes the
number of labelings of $i^{(s)}_1 \le_s \dots \le_s i^{(s)}_{\deg(s)-1}$ by $O_l(d^{\deg(s)-2})$. Then for this
reduced problem, it is equivalent to evaluate the number of solutions to the equation

$$\sum_{t=1}^{\deg(s)} b_t = 0 \quad \text{restricted to } |b_t| \le d \text{ for all } t = 1, \dots, \deg(s), \tag{2.1}$$

since $b_t$ is associated with $i^{(s)}_t - i^{(s)}_{t-1}$, where $i^{(s)}_0 := i^{(s)}_{\deg(s)} := i^{(s)}_{parent}$. Let $a_t = b_t + d$
for all $t = 1, \dots, \deg(s)$. Then, the number of solutions to (2.1) is the same as to
the problem

$$\sum_{t=1}^{\deg(s)} a_t = \deg(s) d \quad \text{restricted to } a_t = 0, \dots, 2d \text{ for all } t = 1, \dots, \deg(s). \tag{2.2}$$

The numbers of solutions to (2.2) and to

$$\sum_{t=1}^{\deg(s)} a_t = \deg(s) d \quad \text{restricted to } a_t = 1, \dots, 2d \text{ for all } t = 1, \dots, \deg(s) \tag{2.3}$$

differ by $O_l(d^{\deg(s)-2})$. Putting all those labelings together proves the claim. □

Fig 6. We label simultaneously the vertices $i^{(s)}_1, \dots, i^{(s)}_{\deg(s)-1}$. The vertices on the I-line
“above” those vertices have not been labeled at this point.
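
The passage from (2.1) to (2.2) and (2.3) can be verified numerically for small values. The sketch below counts the solutions of the three problems by plain enumeration; the crude bound used in the comment (a solution of (2.2) that is not a solution of (2.3) has some coordinate equal to zero) is a hedged reading of why the two counts are close, and the function names are hypothetical.

```python
from itertools import product


def count_solutions(deg, values, target):
    """Number of tuples of length deg with entries in values summing to target."""
    return sum(1 for t in product(values, repeat=deg) if sum(t) == target)


def solution_counts(deg, d):
    """Numbers of solutions of problems (2.1), (2.2) and (2.3) for given deg and d."""
    n21 = count_solutions(deg, range(-d, d + 1), 0)           # |b_t| <= d, sum 0
    n22 = count_solutions(deg, range(0, 2 * d + 1), deg * d)  # a_t = b_t + d
    n23 = count_solutions(deg, range(1, 2 * d + 1), deg * d)  # a_t >= 1, i.e. F(deg*d, deg, 2d)
    return n21, n22, n23


# Solutions of (2.2) missing from (2.3) have some a_t = 0; fixing that coordinate
# leaves at most (2d + 1)^(deg - 2) choices, so the gap is of lower order in d.
print(solution_counts(3, 2))
```

The shift $a_t = b_t + d$ is a bijection between the solution sets of (2.1) and (2.2), so the first two counts always agree, while the third is the restricted composition number $F(\deg(s)d, \deg(s), 2d)$.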


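3. Main result. Now we are able to state and prove the main result of this
article.


Theorem 3.1. Let $(X^{(p,n)})$ be a sequence of random matrices $X^{(p,n)} \in
\mathbb{R}^{p \times n}$ with independent entries with mean 0 and variance 1. Additionally,
suppose that for any $\eta > 0$,

$$\frac{1}{\eta^2 np} \sum_{i=1}^{p} \sum_{k=1}^{n} E\left( |X^{(p,n)}_{ik}|^2 \, \mathbb{1}\{|X^{(p,n)}_{ik}| > \eta \sqrt{n}\} \right) \to 0. \tag{3.1}$$

Denote

$$S^{(p,n)} := \left( \frac{1}{n} X^{(p,n)} (X^{(p,n)})' \right) \circ \mathbb{1}_d. \tag{3.2}$$

Then, the sequence of empirical spectral distributions $\mu^{S^{(p,n)}} = \frac{1}{p} \sum_{i=1}^{p} \delta_{\lambda_i(S^{(p,n)})}$
almost surely converges weakly to a measure $\mu$, as $n \to \infty$ or $p \to \infty$, while $d \to \infty$,
$n/p \to 0$, and $2d/n \to y > 0$. The l-th moment $m_l$, $l \in \mathbb{N}$, of the limiting spectral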

distribution is given by 

Gk* j —0 

where the outer sum runs over all canonical ordered trees $G(i^c, k^c)$ with $l + 1$ vertices
such that the part containing the root lies on the I-line and the other part on the
K-line, and the product runs over all vertices $k^* \in \{k^c_1, \dots, k^c_l\}$.


Note that by bounding the quantities $F(n, k, m)$ by $m^{k-1}$ it follows from the proof
of Theorem 3.1 that

$$m_l \le \sum_{r=0}^{l-1} \frac{1}{r + 1} \binom{l}{r} \binom{l - 1}{r} y^r.$$

The right-hand side is the l-th moment of the Marčenko-Pastur distribution with
parameter $y$, which implies that the random variable $X \sim \mu$ is bounded in absolute
value by $(1 + \sqrt{y})^2$ since

$$P(|X| > x) \le \frac{m_{2l}}{x^{2l}} \xrightarrow{l \to \infty} 0$$

for any $x > (1 + \sqrt{y})^2$. Beyond that, so far nothing is known about the
distribution $\mu$. In particular, the natural questions for which $y > 0$ the random variable
$X$ is negative with positive probability, and whether the bound $(1 + \sqrt{y})^2$
on the support is sharp, are open. The answers to both problems are essential
groundwork to understand the asymptotic behavior of the extreme eigenvalues of
high-dimensional banded sample covariance matrices, cf. Bai and Yin (1993) for the
almost sure limits of the extreme eigenvalues of high-dimensional sample covariance
matrices.
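
The moment bound can be made concrete with a few lines of code. This is a minimal sketch, assuming the standard expression of the Marčenko-Pastur moments as a sum over $r = 0, \dots, l - 1$ with weights $\frac{1}{r+1}\binom{l}{r}\binom{l-1}{r}$; the function name `mp_moment` is hypothetical.

```python
from math import comb, sqrt


def mp_moment(l, y):
    """l-th moment of the Marcenko-Pastur distribution with parameter y."""
    return sum(comb(l, r) * comb(l - 1, r) / (r + 1) * y ** r for r in range(l))


y = 1.0 / 3.0
edge = (1.0 + sqrt(y)) ** 2   # upper edge of the Marcenko-Pastur support
for l in (1, 2, 3, 10, 40):
    # The l-th root of the l-th moment approaches the edge from below.
    print(l, mp_moment(l, y) ** (1.0 / l), edge)
```

Because the l-th moment never exceeds $(1 + \sqrt{y})^{2l}$, Markov's inequality applied to high moments forces all mass beyond $(1 + \sqrt{y})^2$ to vanish, which is the argument used for the limiting measure above.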


Proof. For ease of notation write $X$ and $S$ instead of $X^{(p,n)}$ and $S^{(p,n)}$. Accordingly,
the entries of $X$ and $S$ are denoted by $X_{ik}$ and $S_{ij}$. Since in the situation
of the theorem almost sure convergence for $p \to \infty$ is a stronger statement than for
$n \to \infty$, we restrict the proof to the case $p \to \infty$, while $n/p \to 0$, and $2d/n \to y$. First,
we choose a sequence $\eta_n \to 0$ such that

$$\frac{1}{\eta_n^2 np} \sum_{i=1}^{p} \sum_{k=1}^{n} E\left( |X_{ik}|^2 \, \mathbb{1}\{|X_{ik}| > \eta_n \sqrt{n}\} \right) \to 0,$$


and pn > 15^1 n > 2. As in the proof of the Theorem 3.10 in Bai and Silverstein 
(2010) we start with the step of truncation, centralization and standardization 
which allows us to work with a simplified matrix afterwards. Whereas the arguments
for truncation at the level $\eta_n \sqrt{n}$ may be transferred almost analogously to the matrix
$S$, the arguments for centralization and standardization need to be refined.
Then, the convergence of the expected moments of the empirical spectral distribution
is shown by means of Lemma 2.2 which is crucial at this step. Finally, we have
to prove that the fluctuations of the moments of the empirical spectral distribution
almost surely converge to zero. This may be done similarly as in Bai and Silverstein
(2010) for Wigner matrices.

Fig 7. Histogram of the eigenvalues of a 5000 x 5000 sample covariance with bandwidth
100 based on 600 samples from the standard normal distribution (horizontal axis: eigenvalues).

Fig 8. Histogram of the eigenvalues of a 5000 x 5000 sample covariance with bandwidth
300 based on 300 samples from the standard normal distribution (horizontal axis: eigenvalues).
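
A small simulation in the spirit of Fig 7 and Fig 8 can be run with the standard library alone. The sketch below is an illustration under stated assumptions: the banded sample covariance matrix is taken to be $XX'/n$ with all entries outside the band $|i - j| \le d$ set to zero, the entries of $X$ are standard normal, and the sizes are far smaller than in the figures. Instead of eigenvalues it reports the first two empirical spectral moments, which are traces of powers of the matrix. The function names are hypothetical.

```python
import random


def banded_sample_covariance(p, n, d, rng):
    """S = (X X' / n) restricted to the band |i - j| <= d, X with i.i.d. N(0, 1) entries."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    S = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, min(p, i + d + 1)):
            s_ij = sum(x * y for x, y in zip(X[i], X[j])) / n
            S[i][j] = S[j][i] = s_ij
    return S


def first_two_moments(S):
    """(tr S / p, tr S^2 / p): the first two moments of the spectral distribution."""
    p = len(S)
    m1 = sum(S[i][i] for i in range(p)) / p
    m2 = sum(v * v for row in S for v in row) / p   # tr(S^2) for symmetric S
    return m1, m2


rng = random.Random(0)      # fixed seed, chosen arbitrarily
p, n, d = 60, 30, 5
S = banded_sample_covariance(p, n, d, rng)
m1, m2 = first_two_moments(S)
print(m1, m2)
```

The first moment concentrates around 1 because the diagonal entries are averages of squared standard normal variables. Higher moments can be obtained in the same way from traces of higher powers, which is how the method of moments accesses the spectrum without computing eigenvalues.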


3.1. Truncation, centralization and standardization. Let $\tilde X$ be the matrix
with entries $\tilde X_{ik} := X_{ik} \mathbb{1}\{|X_{ik}| \le \eta_n \sqrt{n}\}$ and $\tilde S$ be the matrix defined by the right
hand side of (3.2), where $X$ is replaced by $\tilde X$. Then,


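$$d_K(\mu^S, \mu^{\tilde S}) \le \frac{1}{p} \operatorname{rank}(S - \tilde S) \tag{3.3}$$

$$\le \frac{1}{p} \operatorname{rank}\left( \left( (X - \tilde X) X' + \tilde X (X - \tilde X)' \right) \circ \mathbb{1}_d \right) \tag{3.4}$$

$$\le \frac{2}{p} \sum_{i=1}^{p} \sum_{k=1}^{n} \mathbb{1}\{|X_{ik}| > \eta_n \sqrt{n}\}, \tag{3.5}$$

where Theorem 1.1 is used in the first line, and the last line follows by subadditivity
of the rank and the fact that $X - \tilde X$ has no more than

$$\sum_{i=1}^{p} \sum_{k=1}^{n} \mathbb{1}\{|X_{ik}| > \eta_n \sqrt{n}\}$$

non-zero rows. Hence, it remains to prove that

$$\frac{1}{p} \sum_{i=1}^{p} \sum_{k=1}^{n} \mathbb{1}\{|X_{ik}| > \eta_n \sqrt{n}\} \to 0 \quad \text{as } p \to \infty. \tag{3.6}$$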

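Analogously to page 27 in Bai and Silverstein (2010) we have by (3.1)

$$\frac{1}{p} \sum_{i=1}^{p} \sum_{k=1}^{n} E \, \mathbb{1}\{|X_{ik}| > \eta_n \sqrt{n}\} = o(1) \tag{3.7}$$

and

$$\operatorname{Var}\left( \frac{1}{p} \sum_{i=1}^{p} \sum_{k=1}^{n} \mathbb{1}\{|X_{ik}| > \eta_n \sqrt{n}\} \right) = o\left( \frac{1}{p} \right) \tag{3.8}$$

such that by Bernstein's inequality and the Borel-Cantelli lemma for any $\varepsilon > 0$

$$\limsup_{p \to \infty} \frac{1}{p} \sum_{i=1}^{p} \sum_{k=1}^{n} \mathbb{1}\{|X_{ik}| > \eta_n \sqrt{n}\} \le \varepsilon \quad \text{a.s.} \tag{3.9}$$

Therefore,

$$d_K(\mu^S, \mu^{\tilde S}) \to 0 \quad \text{as } p \to \infty. \tag{3.10}$$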


Redefine $X$ by $\tilde X$ and $S$ by $\tilde S$. Now, we prove that we may recenter the entries of
the matrix $X$. We have

(3.11) dl 

(3.12) < 2dl 

(3 13) _|_ 2d| ^(i(^-E^)(^-EX)')oij^ 

By Corollary 1.3 for the term (3.13) holds 


(3.14) 

(3.15) 

(3.16) 


(3.13)^^ < 


3/2 ^ 2-\/2 


tr 


-(EX (EXO)ol, 


2 /2 P / " \ 

E EXifcEXifc l{|z - j| < d} 


< 


pn 

^ \k^l 

2V2{2d+l) 


0 , 


where the last line follows by the inequality |EXifc| < To evaluate the 

term (3.12) we combine Corollary 1.3 and Theorem 1.1. Thereto, we first prove that
there are not too many rows $i$ in the matrix $X$ which satisfy


(3.17) 


P / \ 

^(^XfeEX.J l{\^-J\<d}>^. 

J Vn 


j—1 \k=l 

By the union bound and Markov’s inequality, 


(3.18) 

(3.19) 

(3.20) 

(3.21) 


^i=i \fe=i 

p n 


^ p IE E 1 - ji < rf} > ^ 

ij—lk—l 


/ 


+ 1 


\ 


^ ^ ^ ' XifejXifejEXifcj^EXjfejl {|* j| ^ d} > g 


j=l fel,fe2 = l 
\ fcl#fc2 




< 




Y . UYrk =^ xlm ,, f ^{\^- J \< d } 


n 
4r;iOE


(3.22) 

(3.23) 


Sfci5.^fc2 {l* j\ — 


H- 

^ 2(2d+l)r?3 8r;6(2d+l)^ 

_ ~l~ o 


2+4+4 i - 

Vi Vn. 


Thus, by Hoeffding’s inequality for sufficiently large p 

\ 2 


P I P / ^ 


E1 ^ E (E ] 1 {I* - j I < d} > ^ ^ j 


i—l I j—1 \k—l 


< 


E 1 E i{i*-ji<d}> 


n 

45 


i—l \ \ j—1 \k—l 

( P / n 


-“|E ^{\i-j\<d}>^ 


> p Vn^- 


Mr]n 16?7^(2d+l)" 


^ I o ( /— '^dpn lQpl{2d+lf 
< exp -2p y/rjn - 


n 




The last line is summable over p. Hence, by the Borel-Cantelli lemma 

\ 2 


(3.24) l{\i-j\<d}> 


P \ P / n 


P i=i I j=i \k=i 


Let X £ be the matrix with entries 


0 as p —>■ 00 . 


(3.25) 


Xik '■= Xikl < ^ [ y] Xik^Xjk ] 1 {|* — j\ < d} 


< 


j=i \k=i 

Then, we obtain 
(3.12) < Adi 

+ Adi ^(s^^'+s^>EX'+i(EX)x')ol<j 

(s(' 


< 4 - tr 

\P 


- XEX' + (EX)X o 1 


2/3 


+ 4 ( - rank 
.P 


(X-X) EX'+ EX (X-Xyj ol 
By (3.24) the summand in the last line vanishes asymptotically almost surely,
whereas for the first term we have 


(3.26) 

1 

-tr 

{-( XEX' + (EX)x'^ 

P 

\n \ / 

(3.27) 

4 P / ^ 

^ ij=l \k=l 

So, 



(3.28) 

dL 



4 

nr]^ 


Subsequently denote $X - EX$ by $X$. It remains to standardize the matrix $X$. In
fact, we do not standardize all entries of $X$ but those with


(3.29) 


<^ik ■= > Vn- 


In particular, by condition (3.1) only $o(np)$ entries do not satisfy (3.29). Without
loss of generality we may assume that either $n \le p/(\log p)^2$ or $n > p/(\log p)^2$


holds on the whole sequence. Let S be the matrix with entries Su := Su, i = 1, ■■■,P, 
and 

1 " 

Sij ■= — 'y ) d'ji, Xif^Xji.l{\i — jj < d}, i ^ j. 


k=l 


where 


^ik •— ^ik^n ■— ^ T ■ 

First consider the case $n > p/(\log p)^2$. Define

p n 


EE -1) - Ex^k){x]k- i{i*-ji< d} 

i,j—l k—1 

9 J’ ” / ,2 

^^■■=—^EE - 1) - Exffc) -j\<d} 

k—1 

^ p n 2 

^^■■=—2 EE- 1) ^xlw.x%^{\^-j\<d} 

i^j—1 k—1 
^ P n 

k '■= Ej Ej {^ikidjki - l) {y^ik2^jk2 ~ XikiXjk^Xik2Xjk2 


i,j=l ki,k2 = l 
i¥=j ki^k2 








X 1{|* - j| < d}. 


By Corollary 1.3 we have 

. p / n \ 

4 ^ E E -1) - j\ < d} 

^ i,j=l \k=l / 

= Ii + I 2 + I 3 + li- 


First note that by (3.1) and by the inequality $\mathbb{E}X_{ik}^2 \le 1$ the term $I_3$ satisfies


^ p n 

I 3 < - 2 E E “ CTikC^jkf 1{|* - j\ > d} 

i,j — l k—1 
p n 

1{I* - j\ > d} 


pn^ 


i,j = l k—1 


= EXf, (1 - EX|fc)) 1 {|* - j\ > d} 


pn 


Ad 


i,j — l k—1 




pn^ ^ 

^ 2=1 fc = l 


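Let $\varepsilon > 0$ and $p \in \mathbb{N}$ sufficiently large such that $I_3 < \varepsilon/4$. Then,

\[ \mathbb{P}\big( d_L^3(F^{S}, F^{\tilde S}) > \varepsilon \big) \le \mathbb{P}(I_1 > \varepsilon/4) + \mathbb{P}(I_2 > \varepsilon/4) + \mathbb{P}(I_4 > \varepsilon/4). \]

We will use Markov’s inequality to bound each of the three probabilities on the
right-hand side. Denote $Y_{ik} := X_{ik}^2 - \mathbb{E}X_{ik}^2$ and observe that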

(3.30) E\Y,kr m € N. 


Then for p sufficiently large, 


p p p p n 

EK.I‘<;m E E E E E 

" 22,42 = 1 ®3,i3 = l »4,i4 = l fcl,fc2, 

*l#il *2^42 *35^43 *45^44 fc3,fc4 = l 


^YlYtikiYj.kMl^i - ji\ < d} 

Z=1 


< 


p p p p n 

;)m E E E E E 

*1.41=1 *2,42 = 1 *3,43 = 1 *4,44 = 1 fc = l 
*1/41 *2^42 *3/43 *45^44 


EjlYi^kYj^kHlii -ji\ < 4 

Z=1 


p p p p n 

+ rjSn^pi E E E E E |E^*ifcir;'ifci^i2fci42fcil 

” *1,41 = 1 *2,42 = 1 *3.43 = 1 *4,44 = 1 fel,fc2 = l 

*15^41 *25^42 *3/43 *45^44 ki^k2 
X |EL^3fc2^'3fc2^4fc2^4fc2 I n l{|i; -ji\< d}


1=1 






4 

jk 


iJ—1k—1 


y y P P n 

E E E E E 


Vn'n°P‘ 


iljl = l *2,i2 = l *3 J3 = l 14 J4 = l k=l 
*2#j2 i3#i3 *4#j4 

3< I {il ,12 .*3 .*4 Jl ,i2 ,i3 ,j4 } I <4 




iife 


;=i 


12 


E E E 


2 

12^2 


il,jl = l 12,12 = 1 fcl,fc2 = l 
ii/ii *2 5^12 ki^k2 


< c 


77^ ?7^ 1 

/n I /n I 


where C > 0 is an absolute constant. Further, 

O P " 


E|/2|" < ;£f E E lEV,, 

ki^i2k2^izkz ^4^4 I 


U5*2,'i3,^4 = l fci,fc2,A;3,fc4 = l 
p n 40 ^ 


- 7?8n8 ^ r/8n8 ^ ^ EV.^^^EF, 

i=l fc = l 11,42 = 1 fc=l 

4l#i2 


2 

*2^2 


and, 


48 

778778 


E E EVi^fc^EV, 


2 

^2^2 


^1,^2 — 1 fci,A;2 = l 
ki^k2 


^ 32p ^ 96p^ ^ 96p^ 


Vln ^ 774775 774^4 


E|/4p < 


1 




'yyyy ^Xij^kiXj-^kiXi-^k2^jik2 


4l,il=l *2,12 = 1 fcl,7*2 = 1 7*3,7*4 = 1 
*l#ll *2#l2 7*15^7*2 7*35117*4 

X "^*2 7*3 "^12 7*3 "^*2 7*4 "^12 7*4 


77877^774 


4 

- 778772 ■ 


E E EX^^,^EX2,^EX^,^EX2 


2 

1i7*2 


*i,li = l 7*1, 7*2 = 1 
7i#li 7 * 15117*2 


Each of the three expressions is summable over $p$. Therefore, almost surely,
$d_L^3(F^{S}, F^{\tilde S}) \le \varepsilon$ for $p$ sufficiently large. This implies $d_L(F^{S}, F^{\tilde S}) \to 0$ almost
surely as $p \to \infty$.

The arguments used here for $n > p/(\log p)^2$ are also applicable to the general case.
However, this requires evaluating the fourth moments of $I_1$ and $I_2$ more carefully
and deducing an appropriate bound for $\mathbb{E}|I_4|^2$. Possibly, the following arguments
are more suitable for the case $n \le p/(\log p)^2$.

The essential idea is to cover the band structure of the matrix by a composed
block structure, and then to exploit the independence of the submatrices of $S - \tilde S$
corresponding to a single block structure. To this end, define the index sets

\[ \mathcal{I}_k^{(1)} := \{2kd-2d+k,\ 2kd-2d+k+1,\ \dots,\ 2(k+1)d-2d+k\}, \qquad k = 1, \dots, \lfloor p/(2d+1) \rfloor, \]

and

\[ \mathcal{I}_k^{(2)} := \{2kd-d+k,\ 2kd-d+k+1,\ \dots,\ 2(k+1)d-d+k\}, \qquad k = 1, \dots, \lfloor (p-d)/(2d+1) \rfloor. \]

Note that at most $2d$ rows in the lower right corner of the matrix might not be
covered by the composed block structure.
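
The covering claim can be checked mechanically for small sizes. The following is a minimal sketch, not part of the original argument: it assumes the band is $\{(i,j) : |i-j| \le d\}$ with 1-based indices, and the function names `block_index_sets` and `uncovered_band_pairs` are hypothetical helpers introduced only for this illustration.

```python
def block_index_sets(p, d):
    """Return the two families of index sets (1-based), each block of size 2d+1.

    First family:  {(k-1)(2d+1)+1, ..., k(2d+1)},        k = 1..floor(p/(2d+1)).
    Second family: {(k-1)(2d+1)+d+1, ..., k(2d+1)+d},    k = 1..floor((p-d)/(2d+1)).
    These are the sets 2kd-2d+k, ... and 2kd-d+k, ... written in a shorter form.
    """
    w = 2 * d + 1
    first = [range((k - 1) * w + 1, k * w + 1) for k in range(1, p // w + 1)]
    second = [range((k - 1) * w + d + 1, k * w + d + 1)
              for k in range(1, (p - d) // w + 1)]
    return first, second


def uncovered_band_pairs(p, d):
    """List band positions (i, j), i <= j <= i+d, lying in no diagonal block."""
    first, second = block_index_sets(p, d)
    blocks = [set(b) for b in first + second]
    missing = []
    for i in range(1, p + 1):
        for j in range(i, min(p, i + d) + 1):
            # Assumption: the band is |i - j| <= d; by symmetry only i <= j is checked.
            if not any(i in b and j in b for b in blocks):
                missing.append((i, j))
    return missing
```

For instance, with $p = 11$ and $d = 2$ the only band positions left uncovered involve the last index 11, in line with the remark that at most $2d$ trailing rows are affected.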



Fig 9. Band structure covered by two block structures 


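Let $S_l^{(1)}, \tilde S_l^{(1)} \in \mathbb{R}^{(2d+1)\times(2d+1)}$, $l = 1, \dots, \lfloor p/(2d+1) \rfloor$, be the submatrices of $S$
and $\tilde S$ corresponding to the indices $\mathcal{I}_l^{(1)} \times \mathcal{I}_l^{(1)}$ of the first block structure. Analogously, define the matrices
$S_l^{(2)}, \tilde S_l^{(2)} \in \mathbb{R}^{(2d+1)\times(2d+1)}$ for $l = 1, \dots, \lfloor (p-d)/(2d+1) \rfloor$. Then it holds for any
$l = 1, \dots, \lfloor p/(2d+1) \rfloor$,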


(( 


(3.31) Etr( )< 








fc—1 


(3.32) 


n n 
and analogously for $l = 1, \dots, \lfloor (p-d)/(2d+1) \rfloor$,


((^ 


Etr ( ( 




n n 


iGE 


(2) k=l 


We conclude by condition (3.1), inequality (3.32), and Markov’s and Hoeffding’s
inequality, for any $\varepsilon > 0$ and $p$ sufficiently large,


/Lp/(2d+l)J 

(S 


{ Y, 


^ {2d + l)iS > E ^\_p/ (2g? + 1)J 


/ Lp/(2fi+i)J / r / / X 2\ ^ 

E (l l)e} 

-El|tr(^(sp^-5p^)^^ > (2d+l)e 


< 




i—1 k—1 


Lp/(2d+l)J 




£ (^l|tr(^(5p^-5p)) ) >(2d+l)£| 


(0 


-(1) o(l) 


-El^tr( (S') ) >(2d+l)£j> ) > -[p/(2d+l)J 


< exp — 


2 

e p 


4(2d+ 1 )) ' 
and accordingly, 

'L(p-d)/(2<i+l)J 


E j > (2(i+1)£| > £b/(2d+1)J 


< exp — 


2 

e p 


4(2d+ 1 )) ■ 

Combining Corollary 1.3 and Theorem 1.1 yields for p sufficiently large, 
dL{F^y^') 

'"/(2d+l)J r / 2\ 

l|tr(^(5, -SP) j > (2d+l)£ 


M 4d + 2 
< — + 


p p 


+ 


Ad+ 2 
P 


l(p-d)/{2d+l)\ 


E * •' 

7 — 1 ^ ^ 


1 tr (S, -Sf) >{2d+l)e 
+


+ 


Lp/(2d+l)J 


^ l|tr(^(5f-5f^) ^<{2d+l)eYr(^[st-Sf) ) 


L(p-d)/(2d+l)J 


E 

1^1 


l^tr(^(sr-5f)') <(2d+l)£ 


<( 


X tr ( ( S', - S, 


1/3 


^ 3£ “t- 3£ 

with probability not larger than

\[ 2\exp\Big( -\frac{\varepsilon^2 p}{4(2d+1)} \Big). \tag{3.33} \]


Note that the first term in the bound on $d_L(F^{S}, F^{\tilde S})$ occurs by removing the rows
and columns from $S$ and $\tilde S$ which are not covered by the block structures. The
second and third terms treat the blocks which are removed from $S$ and $\tilde S$ for
irregularity. Finally, the last term bounds the Lévy distance between the spectral
measures of the reduced matrices. The terms (3.33) are summable since

\[ n \le \frac{p}{(\log p)^2}. \]
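
The summability can be spelled out by hand. The following is only a sketch under two assumptions that are not quoted verbatim from the text: that the bound (3.33) has the form $2\exp(-\varepsilon^2 p/(4(2d+1)))$, and that $(2d+1)/n \to y$, so that $2d+1 \le 2yn$ for $p$ large (the constant $2y$ is a convenient choice, nothing more).

```latex
% Assumptions: (3.33) = 2 exp(-eps^2 p / (4(2d+1))),  2d+1 <= 2 y n for large p,
% and n <= p / (log p)^2 on the whole sequence.
\begin{align*}
2d+1 &\le 2yn \le \frac{2yp}{(\log p)^2}, \\
\frac{\varepsilon^2 p}{4(2d+1)} &\ge \frac{\varepsilon^2 (\log p)^2}{8y}, \\
2\exp\Big(-\frac{\varepsilon^2 p}{4(2d+1)}\Big)
  &\le 2\exp\Big(-\frac{\varepsilon^2 (\log p)^2}{8y}\Big)
   = 2\,p^{-\varepsilon^2 \log p/(8y)}
   \le 2\,p^{-2}
   \quad \text{once } \log p \ge 16y/\varepsilon^2 .
\end{align*}
```

Since $\sum_p p^{-2} < \infty$, the Borel-Cantelli lemma applies under these assumptions.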


As a consequence,

\[ d_L(F^{S}, F^{\tilde S}) \to 0 \]

almost surely as $p \to \infty$. As before, redefine $S$ by $\tilde S$. It remains to rescale the
diagonal entries of $S$. To this end, let $\tilde S \in \mathbb{R}^{p \times p}$ have the same off-diagonal entries
as $S$ and

\[ \tilde S_{ii} = \frac{1}{n} \sum_{k=1}^{n} \tilde\sigma_{ik}^{-2} X_{ik}^2, \qquad i = 1, \dots, p. \]

Here, we may use similar arguments as for the rescaling of the off-diagonal entries 
but we choose a = 1 instead of a = 2 in Theorem 1.2. By the Lidskii-Wielandt 
perturbation bound (1.2) in Li and Mathias (1999), we have 


p , , P n 

•£ A.(S) - A.(S) < -IIS - S|U. = - E E ("S’ -1) 

i=l p ^p i=l k=l 


Furthermore, for each $i = 1, \dots, p$ it holds that



fc=i 


1) EXf, < 1 - 
and therefore by Markov’s and Hoeffding’s inequality together with (3.1) for p
sufficiently large,


P f n 


\i—l Kk—l 


1 - 1) Xfk >£n\> spj 


<p(f 

\ 2=1 


a-1 


-El 


en 




^ £P 


<r \ 

< exp —— 


Now, Theorem 1.2 and Theorem 1.1 yield 

di(F®,F®) <2e + V^ 

with probability 


exp - 


9 

e p 


Again, by the Borel-Cantelli lemma,

\[ d_L(F^{S}, F^{\tilde S}) \to 0 \]

almost surely as $p \to \infty$.

Subsequently, we may assume that the matrix $X$ has the following properties:

1. All entries $X_{ik}$ are centered.

2. All but $o(pn)$ entries are standardized, and if an entry $X_{ik}$ is not standardized
then $\mathbb{E}X_{ik}^2 < \eta_n$.

3. All entries of $X$ are bounded by $\eta_n\sqrt{n}$, where $\eta_n \downarrow 0$ with $\eta_n >$

Finally, we replace the non-standardized entries of $X$ by Rademacher variables.
First, define $\tilde X = (X_{ik}\mathbb{1}\{\mathbb{E}X_{ik}^2 = 1\})_{i,k}$. By an analogous line of reasoning as in the
rescaling step it follows that

\[ d_L(F^{S}, F^{\tilde S}) \to 0 \quad \text{a.s. as } p \to \infty, \]

where

xx'^ou. 

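Now, let $\hat X \in \mathbb{R}^{p \times n}$ have the entries $\hat X_{ik} = X_{ik}\mathbb{1}\{\mathbb{E}X_{ik}^2 = 1\} + \varepsilon_{ik}\mathbb{1}\{\mathbb{E}X_{ik}^2 < 1\}$,

where $\varepsilon_{ik}$, $i = 1, \dots, p$, $k = 1, \dots, n$, are independent Rademacher variables and
independent of $X$. Moreover, define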

O Irf 
Again by Corollary 1.3,
(3.34) 


(3.35) 


< i tr f (S-S^ 


= —^ ^-AifcAjfc — AifeAjfc j l{|i —j|<d} 


pn 


2,i = l \fc=l 


2 

< - 

piT? 


p /n 

s i:(^ 

ij—l \k—l 


Xik — Xik\Xjk \ l{\i — j\<d} 


+ ( - Xjk) X,k ] 1{N - j\ < d} 


Kk=l 


pn- 


pn^ 


(3.36) 

(3.37) 

(3.38) 

For line (3.36) we have 


p/n 

xn 


^ik ^ik ) ^ik 


\k—l 

P / n 




i,j = l \fc=l 


+ [x,k - Xjk) X,kj 1{N - j| < d} 


„ P / n ^ P n 

(3.36) = s-EEifExi<i}^o. 

^ i=i \fc=i / ^ i=i k=i 

The terms (3.37) and (3.38) are handled the same way. Therefore, we just consider
(3.37). Rewrite


2 ^ / \ 2 

(3.37) = ^ E E -J\<d} 

ij = l k=l 

n P " 2 

+ ii^EE - E-?i.) i{|i - ii < <i) 

i,j—l k—1 

2 P «■ 

^ ^ ' y ' ~ Xik^ Xjk^ (^Xik2 ~ Xik^ Xjk2^{\i — j\ E d} 

i,i=lfci,fe2 = l 
fcl#fc2 


Denote the first term by $I_1$, the second by $I_2$, and the third by $I_3$. $I_1$ vanishes
asymptotically since

p n 

• ^ik -^ik 


pn^ 


EE(- 

i,j—l k —1 


2 Zl/V ^ 

1{EXB, < 1} ^ 0. 

i=i k=i 


m—l^m—1 


EY,i < 2 rj^-^n 


Let Yik := X'^^. — As in inequality (3.30) we bound 

(3.39) 

Then we obtain for I 2 , 


e4 < 


2 


E 


'y ] I®^'lfel^'2fe2^'3fe3^'4fc4 I 


il J2,t3,i4 = l ki,k- 2 ,H,ki = l 


98j4 P " 


p'^n^ 


980 j4 P 

" ' EE 


j=i fc=i 


p'^n^ 


2 

ife2 


283 ^-^ 

p'^n^ 


p n 


3 = \ fei,fe2 = l 
ki^k 2 

283^4 




jl.i2 = l k=l 
01^32 


p*n° 


32k2 


31,32 = 1 ki,k 2 = l 
3i¥^ 32 ki^k 2 


2 ^dSl 283(iS: 


p'^rY 


p^rY 


p^rY 


+ 


9 A ’ 


The last line is summable over $p$. Thus, by the Borel-Cantelli lemma, $I_2 \to 0$ almost
surely as $p \to \infty$. Now, consider $I_3$ and note that $\hat X - \tilde X$ and $X$ are independent.
Again, we evaluate the fourth moment
28^4 P 


E4 < 


^ p'^rd 


E E 

i=i fei,...,fcs=i 

fc2i-l#fc2i 


283^4 ^ ^ 


p'^rY 


^Xjki Xj k2 Xj fcj Xjk^^ Xjk^ Xj kg Xj kj Xjk, 

® Aji fci Xj-^k2 Xjiks Xj-^ki ® A'j2 kg Xj2 kg Xj^^kjXj^ks 


31,32 = 1 ki,...,ka = l 
31 Y 32 k2t-i^k2i 


98j4 P " 

<^E E 


p^rY 


3 = 1 ki,...,ks = l 


EXjki Xj k2 Xj kg Xj kg Xjkg Xj kg Xj kj Xjkg 


2403^4 ^ ^ 


p4^8 


EXlkgEXlk2^XlkgEXlkg 


31,32 = 1 kx,k 2 ,kg,kg = l 


284140(i4 2 8N 2403(^4 

< - {^ + rin + ril + vl) + 


p^rY 


p'^n'^ 


where 4140 is the 8th Bell number and gives the number of partitions of $\{1, \dots, 8\}$.
As for $I_2$, we obtain $I_3 \to 0$ almost surely as $p \to \infty$.
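
The value 4140 can be confirmed with a few lines of code. This is a small sketch using the Bell triangle recurrence; the function name `bell_numbers` is a hypothetical helper and not notation from the paper.

```python
def bell_numbers(m):
    """Return [B_0, B_1, ..., B_m] computed with the Bell triangle.

    Each new row starts with the last entry of the previous row, and every
    further entry is the sum of its left neighbour and the entry above that
    neighbour. The first entry of row r is the Bell number B_r.
    """
    row = [1]
    bells = [1]  # B_0 = 1
    for _ in range(m):
        new_row = [row[-1]]
        for above in row:
            new_row.append(new_row[-1] + above)
        row = new_row
        bells.append(row[0])
    return bells
```

Calling `bell_numbers(8)[8]` gives the number of set partitions of an 8-element set used in the bound above.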

In what follows, we may assume that the entries of $X$ are centered, standardized
random variables bounded by $\eta_n\sqrt{n}$ for some decreasing sequence $(\eta_n)$ converging
to 0.

3.2. Almost sure convergence of moments. We use the method of moments
to prove the almost sure weak convergence of the sequence $(F^{S})$. First we prove the
convergence of the expected moments of $F^{S}$. Let $l \in \mathbb{N}$ and $i_{l+1} := i_1$, and define

\[ m_{p,l} := \int x^l \, dF^{S}(x). \]


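Then, we conclude

\[ \mathbb{E}m_{p,l} = \frac{1}{p}\,\mathbb{E}\operatorname{tr} S^l = \frac{1}{pn^l} \sum_{i_1, \dots, i_l = 1}^{p} \ \sum_{k_1, \dots, k_l = 1}^{n} \mathbb{E} \prod_{j=1}^{l} X_{i_j k_j} X_{i_{j+1} k_j} \, \mathbb{1}\{|i_j - i_{j+1}| \le d\}. \tag{3.40} \]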
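
The identity between the trace moment and the sum over closed index walks can be checked numerically on a tiny example (before taking expectations). The sketch below rests on assumptions that are not stated in this excerpt: that $S = \frac{1}{n} XX'$ with all entries outside the band $|i-j| \le d$ set to zero, and that the entries of $X$ are Rademacher signs. All function names are hypothetical.

```python
import itertools
import random


def rademacher_matrix(p, n, seed=0):
    """p x n matrix of independent +-1 entries (an illustrative choice of X)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]


def banded_sample_covariance(X, d):
    """S_ij = (1/n) sum_k X_ik X_jk if |i - j| <= d, else 0 (assumed form of S)."""
    p, n = len(X), len(X[0])
    S = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(max(0, i - d), min(p, i + d + 1)):
            S[i][j] = sum(X[i][k] * X[j][k] for k in range(n)) / n
    return S


def spectral_moment(S, l):
    """m_{p,l} = tr(S^l) / p, computed by repeated matrix multiplication."""
    p = len(S)
    P = [row[:] for row in S]
    for _ in range(l - 1):
        P = [[sum(P[i][k] * S[k][j] for k in range(p)) for j in range(p)]
             for i in range(p)]
    return sum(P[i][i] for i in range(p)) / p


def walk_sum_moment(X, d, l):
    """The same moment written as a sum over walks i_1, k_1, ..., i_l, k_l, i_1."""
    p, n = len(X), len(X[0])
    total = 0.0
    for i in itertools.product(range(p), repeat=l):
        # Band constraint on consecutive row indices, with i_{l+1} := i_1.
        if any(abs(i[s] - i[(s + 1) % l]) > d for s in range(l)):
            continue
        for k in itertools.product(range(n), repeat=l):
            term = 1.0
            for s in range(l):
                term *= X[i[s]][k[s]] * X[i[(s + 1) % l]][k[s]]
            total += term
    return total / (p * n ** l)
```

The brute-force walk sum has $p^l n^l$ terms, so it is only meant for very small $p$, $n$ and $l$.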




For a multi-index $(i_1, k_1, i_2, k_2, \dots, i_l, k_l, i_1)$, let $G = (V, E)$ be the graph with vertex
set $V = \{i_1, \dots, i_l\} + \{k_1, \dots, k_l\}$, where the vertices $i_1, \dots, i_l$ are supposed to lie
on the $I$-line and $k_1, \dots, k_l$ on the $K$-line, and edge set

\[ E = \{\{i_1, k_1\}, \{k_1, i_2\}, \dots, \{i_l, k_l\}, \{k_l, i_1\}\}. \]
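
The graph attached to a multi-index is easy to build explicitly, which helps when checking small cases of the counting argument. The following is a minimal sketch; the tagging of vertices by `"I"` and `"K"` (to keep the two lines disjoint) and the function names are assumptions made for this illustration only.

```python
from collections import Counter


def walk_graph(i_idx, k_idx):
    """Vertices and edge multiplicities of the walk i_1, k_1, i_2, ..., k_l, i_1.

    Vertices are tagged so that equal numbers on the two lines stay distinct.
    Each edge is stored as (I-vertex, K-vertex); the Counter value is the
    number of times the walk traverses that edge.
    """
    l = len(i_idx)
    if len(k_idx) != l:
        raise ValueError("need as many k-indices as i-indices")
    edges = Counter()
    for s in range(l):
        i_here = ("I", i_idx[s])
        i_next = ("I", i_idx[(s + 1) % l])  # i_{l+1} := i_1
        k_here = ("K", k_idx[s])
        edges[(i_here, k_here)] += 1  # step i_s -> k_s
        edges[(i_next, k_here)] += 1  # step k_s -> i_{s+1}
    vertices = {v for edge in edges for v in edge}
    return vertices, edges


def summarize(i_idx, k_idx):
    """Return (|V|, |E|, smallest edge multiplicity) for the given multi-index."""
    vertices, edges = walk_graph(i_idx, k_idx)
    return len(vertices), len(edges), min(edges.values())
```

For example, `summarize((1, 2), (1, 1))` describes a walk on a tree with two edges, each traversed twice, while `summarize((1, 2), (1, 2))` describes a 4-cycle in which every edge is traversed only once.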


First note that

\[ \mathbb{E} \prod_{j=1}^{l} X_{i_j k_j} X_{i_{j+1} k_j} = 0 \]

if the walk $i_1, k_1, i_2, k_2, \dots, i_l, k_l, i_1$ does not cross each edge $e \in E$ at least twice.
For $|E| \le l$, we have

\[ \Big| \mathbb{E} \prod_{j=1}^{l} X_{i_j k_j} X_{i_{j+1} k_j} \Big| \le \eta_n^{2l - 2|E|} \, n^{l - |E|}, \]

where equality holds for $|E| = l$. Since $G$ is connected, we conclude $|V| - 1 \le |E| \le l$.
This implies that only those indices $(i_1, \dots, k_l, i_1)$ contribute asymptotically to the
sum (3.40) for which $|V| - 1 = |E| = l$, and therefore the corresponding graphs $G$
need to be trees. Hence, by Section 2 it remains to consider the sum over canonical
walks $i_1, \dots, k_l, i_1$ of $d$-banded ordered trees in $\mathcal{B}_{p,n,d,l+1}$. We conclude by Lemma
2.2,


lim Em. 

p—>-oo 


'P,l 


— lim j n,fZ,/+i| 

p^oo pn^ 


= lim ^ 

p^oo 




'>IJ]^(deg(r)d,deg(fc*),2d) 


Gi^‘=,k■=) 
deg(fc*)


■ fdeg{k*)\ fdeg{k*)d — 2 jd — 1 


= E ^ (_i). 

G(iSfeO fc* J=0 

deg(fc*) 

= E n E l{fc*>2j}(-l)^(?/(deg(r-2j)))'i^s('=*)-i^ 

G(i=,fc‘=) fe* i=o 


V j J \ deg{k*) - 1 

deg(fc*) 


j!(deg(fc*) - j)!’ 


where the outer sum runs over all canonical ordered trees $G(i^c, k^c)$ and the product
runs over all vertices $k^* \in \{k_1^c, \dots, k_l^c\}$. Note that the cardinality of $\{k_1^c, \dots, k_l^c\}$
depends on the underlying canonical ordered tree and is given by $\max_{s=1,\dots,l} k_s^c$.
Lastly, we use once again the Borel-Cantelli lemma to prove that $m_{p,l} - \mathbb{E}m_{p,l} \to 0$
almost surely as $p \to \infty$. To this end, we evaluate the fourth moment of $m_{p,l} - \mathbb{E}m_{p,l}$.
We follow the line of reasoning in Bai and Silverstein (2010) on pages 30 and 31.
First, rewrite


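\[ \mathbb{E}\,(m_{p,l} - \mathbb{E}m_{p,l})^4 = \frac{1}{p^4 n^{4l}} \sum_{(i_j, k_j),\ j=1,\dots,4} \mathbb{E} \prod_{j=1}^{4} \big( X[i_j, k_j] - \mathbb{E}X[i_j, k_j] \big), \]

where for any $j = 1, 2, 3, 4$, we denote

\[ k_j := (k_{1,j}, \dots, k_{l,j}) \in [n]^l \quad \text{and} \quad i_j := (i_{1,j}, \dots, i_{l,j}) \in [p]^l \]

such that $|i_{s,j} - i_{s+1,j}| \le d$ for $s = 1, \dots, l$ with $i_{l+1,j} := i_{1,j}$, and

\[ X[i_j, k_j] = \prod_{s=1}^{l} X_{i_{s,j} k_{s,j}} X_{i_{s+1,j} k_{s,j}}. \]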

Again, we assume the indices $i_{s,j}$ to lie on the $I$-line and $k_{s,j}$ on the $K$-line. Then
for fixed $(i_j, k_j)$, $j = 1, \dots, 4$, define the graphs $G_j$ with vertex sets

\[ V_j := \{i_{1,j}, \dots, i_{l,j}\} + \{k_{1,j}, \dots, k_{l,j}\} \]

and edge sets

\[ E_j := \{\{i_{1,j}, k_{1,j}\}, \{i_{2,j}, k_{1,j}\}, \dots, \{i_{l,j}, k_{l,j}\}, \{i_{1,j}, k_{l,j}\}\}, \]

and $G$ with vertex set $V := V_1 \cup V_2 \cup V_3 \cup V_4$ and edge set $E := E_1 \cup E_2 \cup E_3 \cup E_4$.
Now observe that

\[ \mathbb{E} \prod_{j=1}^{4} \big( X[i_j, k_j] - \mathbb{E}X[i_j, k_j] \big) = 0 \]

if one of the graphs $G_j$ has no common edge with any of the other three, or if one
edge $e \in E$ occurs only once in the sequence

\[ \begin{aligned}
\sigma := {} & \{i_{1,1}, k_{1,1}\}, \{i_{2,1}, k_{1,1}\}, \dots, \{i_{l,1}, k_{l,1}\}, \{i_{1,1}, k_{l,1}\}, \\
& \{i_{1,2}, k_{1,2}\}, \{i_{2,2}, k_{1,2}\}, \dots, \{i_{l,2}, k_{l,2}\}, \{i_{1,2}, k_{l,2}\}, \\
& \{i_{1,3}, k_{1,3}\}, \{i_{2,3}, k_{1,3}\}, \dots, \{i_{l,3}, k_{l,3}\}, \{i_{1,3}, k_{l,3}\}, \\
& \{i_{1,4}, k_{1,4}\}, \{i_{2,4}, k_{1,4}\}, \dots, \{i_{l,4}, k_{l,4}\}, \{i_{1,4}, k_{l,4}\}.
\end{aligned} \]

We conclude that $G$ consists of at most two connected components, and each edge
of a connected component occurs twice in $\sigma$. In particular, $|V| \le |E| + 2$. Denote
the edges in $E$ by $e_1, \dots, e_{|E|}$ and by $\nu_1, \dots, \nu_{|E|}$ the corresponding multiplicities
of the edges in the sequence $\sigma$. Then,




E n (^[*44 ^4] - e^[* 4 > %]) 
4=1 


^ —4 — 4 ; I'lH-l-‘^|E[ — 2 |iJ| ? 

< 16p n rjn n 2 




The number of indices $(i_j, k_j)$, $j = 1, \dots, 4$, such that the graph $G$ has at most
two connected components, $|E| = s$, $s = 1, \dots, 2l$, and $|V| \le s + 2$ is bounded by
$C_l p^2 n^s$, where the constant $C_l > 0$ depends only on $l$ and $\sup_n d/n < \infty$, and
may be chosen uniformly over all $s = 1, \dots, 2l$. Altogether,

4 

E {nipj — Errip^i)* = ^ E {X[ij,kj] — EX[ij,kj]) 

(b-fcj), 4 = 1 ,■■•,4 4=1 

< 16 ^ 

s<2l 

< 52lGip-‘^. 


The last expression is summable over $p$, and therefore

\[ m_{p,l} \to m_l \]

almost surely as $p \to \infty$.

□